Contributor

@ilicmarkodb ilicmarkodb commented Oct 17, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Added e2e tests for writing tables with collated columns. To enable writing collated data, the schema comparison had to be modified in some places.

How was this patch tested?

New tests.

Does this PR introduce any user-facing changes?

Yes, users can now create and write to Delta tables with collated types.

@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 5 times, most recently from 53c0334 to 9e24ec3 Compare October 22, 2025 22:14
@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 2 times, most recently from c05daf9 to 13e75c4 Compare October 23, 2025 17:02
@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch 11 times, most recently from f8839d0 to e95e836 Compare October 23, 2025 20:21
@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from e95e836 to bc298a1 Compare October 23, 2025 20:42
Collaborator

@allisonport-db allisonport-db left a comment

Tests and approach LGTM; I just have some concerns about how we compare schemas throughout the codebase.

return equals(dataType);
}

/**
Collaborator

Is it worth having both this and the other equivalent method, or might they satisfy the same goal?

Contributor Author

I think equivalentIgnoreCollations is needed because of cases like https://github.com/delta-io/delta/pull/5357/files#diff-c9fd0f3e881617ea0f2439a29c32a35e8c32fbcdda229105be76ece4001819acR200. Here we don't want to ignore names and metadata.
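
To make the distinction concrete, here is a minimal sketch of three comparison modes on a simplified schema model. Everything in it is hypothetical: the Field class, the method names, and the collation strings are illustrative stand-ins and not the Delta Kernel classes or their actual semantics. It only shows why a check that ignores collations but still compares names and metadata is different from a looser check that compares types alone.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of schema comparison; not the Delta Kernel classes. */
final class SchemaCompareSketch {

    /** A field with a name, a base type, a collation identifier, and metadata. */
    static final class Field {
        final String name;
        final String type;
        final String collation;
        final Map<String, String> metadata;

        Field(String name, String type, String collation, Map<String, String> metadata) {
            this.name = name;
            this.type = type;
            this.collation = collation;
            this.metadata = metadata;
        }
    }

    /** Strict mode: name, type, collation, and metadata must all match. */
    static boolean strictEquals(List<Field> a, List<Field> b) {
        return compare(a, b, true, true);
    }

    /** Keeps the name and metadata checks but ignores collations. */
    static boolean equalsIgnoringCollations(List<Field> a, List<Field> b) {
        return compare(a, b, true, false);
    }

    /** Looser mode (an assumption for illustration): compares base types only. */
    static boolean typesOnly(List<Field> a, List<Field> b) {
        return compare(a, b, false, false);
    }

    private static boolean compare(
            List<Field> a, List<Field> b, boolean namesAndMetadata, boolean collations) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Field x = a.get(i);
            Field y = b.get(i);
            if (!x.type.equals(y.type)) {
                return false;
            }
            if (collations && !x.collation.equals(y.collation)) {
                return false;
            }
            if (namesAndMetadata
                    && !(x.name.equals(y.name) && x.metadata.equals(y.metadata))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Collation names below are sample values only.
        List<Field> table = List.of(new Field("city", "string", "UNICODE", Map.of()));
        List<Field> batch = List.of(new Field("city", "string", "UTF8_BINARY", Map.of()));
        List<Field> renamed = List.of(new Field("town", "string", "UTF8_BINARY", Map.of()));

        System.out.println(strictEquals(table, batch));               // false
        System.out.println(equalsIgnoringCollations(table, batch));   // true
        System.out.println(equalsIgnoringCollations(table, renamed)); // false
        System.out.println(typesOnly(table, renamed));                // true
    }
}
```

In this model, the renamed column is rejected by the collation-ignoring check but accepted by the types-only check, which is the gap the comment above describes.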


  ColumnarBatch data = filteredBatch.getData();
- if (!data.getSchema().equals(tableSchema)) {
+ if (!data.getSchema().equivalentIgnoreCollations(tableSchema)) {
Collaborator

I left a similar comment on the other PR, but how do we know when and where we should use this instead? How can we be sure that none of the comparisons we do throughout the codebase will now be an issue? I'm concerned there might be code paths that would fail only if a test had collations in the schema, which the majority of existing tests don't.

It seems like we need to audit everywhere we might compare data types or schemas?
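
One possible way to approach such an audit, sketched under stated assumptions: run each schema comparison against a schema and a copy that differs only in collation, and flag the comparisons that reject the copy. The Column class, the helper names, and the collation strings below are all hypothetical and exist only for illustration; real call sites would have to be wired in by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical audit helper; not part of Delta Kernel. */
final class CollationAuditSketch {

    /** Minimal column model: a name plus a collation identifier. */
    static final class Column {
        final String name;
        final String collation;

        Column(String name, String collation) {
            this.name = name;
            this.collation = collation;
        }
    }

    /** Returns a copy of the schema with every column set to the given collation. */
    static List<Column> withCollation(List<Column> schema, String collation) {
        List<Column> out = new ArrayList<>();
        for (Column c : schema) {
            out.add(new Column(c.name, collation));
        }
        return out;
    }

    /** Names the checks that accept the schema itself but reject a collation-only variant. */
    static List<String> collationSensitive(
            Map<String, BiPredicate<List<Column>, List<Column>>> checks,
            List<Column> schema,
            String otherCollation) {
        List<Column> variant = withCollation(schema, otherCollation);
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, BiPredicate<List<Column>, List<Column>>> e : checks.entrySet()) {
            BiPredicate<List<Column>, List<Column>> check = e.getValue();
            if (check.test(schema, schema) && !check.test(schema, variant)) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    /** Example check that compares names and collations. */
    static boolean strict(List<Column> a, List<Column> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).name.equals(b.get(i).name)
                    || !a.get(i).collation.equals(b.get(i).collation)) {
                return false;
            }
        }
        return true;
    }

    /** Example check that compares names only. */
    static boolean ignoreCollations(List<Column> a, List<Column> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).name.equals(b.get(i).name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> schema = List.of(new Column("c1", "UTF8_BINARY"));
        Map<String, BiPredicate<List<Column>, List<Column>>> checks = new LinkedHashMap<>();
        checks.put("strict", CollationAuditSketch::strict);
        checks.put("ignoreCollations", CollationAuditSketch::ignoreCollations);
        System.out.println(collationSensitive(checks, schema, "UNICODE")); // prints [strict]
    }
}
```

A flagged check is not necessarily a bug; it only marks a comparison whose result depends on collation and therefore deserves a look.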

Collaborator

I'm worried we do this type of comparison all over..

@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from c165c05 to bb11dc1 Compare October 24, 2025 13:57
@ilicmarkodb ilicmarkodb force-pushed the add_e2e_tests_for_collation_write branch from bb11dc1 to 114903b Compare October 24, 2025 13:58